b'Audit of the Base ERA System\xe2\x80\x99s Ability to Ingest Records\n\nOIG Audit Report No. 13-11\n\nSeptember 19, 2013\n\n\nTable of Contents\n\nExecutive Summary\nBackground\nObjectives, Scope, Methodology\nAudit Results\nAppendix A \xe2\x80\x93 Acronyms and Abbreviations\nAppendix B \xe2\x80\x93 Management\'s Response to the Report\nAppendix C \xe2\x80\x93 Report Distribution List\n\nNational Archives and Records Administration\n\n\nExecutive Summary\n\nThe National Archives and Records Administration\xe2\x80\x99s (NARA) Office of Inspector General\n(OIG) completed an audit of the Electronic Records Archives (ERA) System\xe2\x80\x99s 1 ability to ingest\nrecords. Ingest is the process of bringing electronic records into the ERA System, including\nphysical transfer of electronic records into ERA. NARA has been developing, testing, and\nrefining the ERA System since 2005. The total cost to develop the system was over $390\nmillion. The estimated annual cost to operate and maintain the ERA System is approximately\n$30 million. 
We assessed the capability of NARA\xe2\x80\x99s Base ERA System 2 to ingest electronic\nrecords presently and in the near future.\n\nWe found Federal agencies were not using the Base ERA System as envisioned and the system\nlacked the ability to effectively ingest all electronic records. NARA Bulletin 2012-03, issued\nAugust 21, 2012, informed Federal agencies that as of October 1, 2012, NARA will use ERA for\nscheduling records and transferring permanent records. Despite NARA\xe2\x80\x99s guidance, a high\npercentage of agencies have not performed any work in Base ERA.\n\nAs of May 1, 2013, 266 agencies had received Base ERA training. Of these 266 agencies, 52% have\nnever performed work in Base ERA and only 84 have electronic records ingested into Base ERA.\nFurther, despite NARA\xe2\x80\x99s intent for all agencies to perform the ingest function for themselves\nonline, only four have done so. The remaining 80 agencies relied on NARA to ingest electronic\nrecords on their behalf.\n\nIn addition, from its deployment in June 2008 through March 2013, only 5.2 TB\nof electronic records were transferred into Base ERA. Further, Federal agencies initiated\ningest of only 3.2 TB of the 5.2 TB. The remaining electronic records were migrated by NARA\ninto Base ERA using NARA\xe2\x80\x99s Legacy Archival Preservation System.\n\nTo determine why only four Federal agencies were using the Base ERA System to ingest\nelectronic records for themselves as intended, we asked a NARA official why such a high\npercentage of Federal agencies have not performed work in the system. 
This official stated\nNARA Processing Archivists are directing agencies not to ingest records themselves online\nbecause agencies typically do not create well-structured, well-understood, \xe2\x80\x9cclean\xe2\x80\x9d records.\nFurther, this official said agencies that have not done any work are mostly small agencies and\ncommissions. Such agencies usually do not frequently schedule records or transfer permanent\nrecords, and only interact with NARA once every few years or longer. Federal agencies\nprovided several reasons for not transferring electronic records into Base ERA by themselves\nonline. The reasons included: not being ready to do so, comfort allowing NARA to ingest\nrecords on their behalf, following the guidance of NARA, having no applicable data to ingest,\nhaving records with security issues, and experiencing issues with Base ERA. 
However, NARA\nmanagement stated many agencies should have better records management programs and should\nbe working more frequently with NARA to increase usage of Base ERA. Thus, according to\nNARA management, the lack of work in Base ERA can be attributed to agencies\xe2\x80\x99 infrequent\nrecords management workload and/or poor records management practices.\n\n1\n  NARA built ERA to fulfill its mission in the digital age: to safeguard and preserve the records of our government,\nensure that the people can discover, use, and learn from this documentary heritage, and ensure continuing access to\nthe essential documentation of the rights of American citizens and the actions of their government.\n\n2\n  Base ERA allows Federal agencies to perform critical records management transactions with NARA online.\nFederal agency records management staff use Base ERA to draft new records retention schedules for records in any\nformat, officially submit those schedules for approval by NARA, request the transfer of records in any format to\nNARA for accessioning or pre-accessioning, and submit electronic records for storage in the Base ERA electronic\nrecords repository.\n\nAdditionally, Base ERA\xe2\x80\x99s usefulness is limited by performance issues. Base ERA experiences\nproblems when ingesting large amounts of data. First, packages or shipments of files with a size\nof 1 GB (and sometimes less) fail to transfer from agency sites to the Base ERA ingest staging\narea using the web version of Base ERA. In addition, the system fails when a user attempts to\nship a package containing 10,000 or more files. Lastly, transfer requests (which may contain\nmultiple packages) fail if the number of files/folders associated with the transfer request\napproaches or exceeds 100,000 files. NARA believes that system design limitations may be the\ncause of some of these weaknesses, but the actual cause for all of them is not known. As a\nresult, the system\xe2\x80\x99s usefulness to NARA and other Federal agencies is limited.\n\nThe system\xe2\x80\x99s issues need to be addressed for NARA and Federal agencies to use it effectively\nand efficiently as envisioned. If not addressed, these issues could worsen considerably in future\nyears as data volumes are expected to increase significantly. An outside entity reported Federal\nagencies currently store an estimated 1.6 petabytes 3 of data, and this is projected to increase to\n2.6 petabytes within the next two years. Further, NARA officials need to begin planning for an\nincrease in the size of files as well as the volume of data.\n\nFinally, our ability to fully review the ingest function of Base ERA was limited due to issues\nwith NARA\xe2\x80\x99s Base ERA reports. 
These issues included inaccurate data in reports, reports\ncapturing data for limited periods of time, and a lack of reports capturing the number of Federal\nagencies performing different methods of ingest.\n\nOur audit identified areas of improvement to Base ERA. We made three recommendations to\nenhance the system\xe2\x80\x99s usefulness to NARA and other Federal agencies.\n\n3\n 1024 gigabytes equals 1 terabyte, and 1024 terabytes equals 1 petabyte. For reference, 1 gigabyte can hold 7\nminutes of high-definition TV video while 1 petabyte can hold 13.3 years of high-definition TV video.\n\n\nBackground\n\nThe Electronic Records Archives (ERA) is the system used by the National Archives and\nRecords Administration (NARA) to allow Federal agencies to perform critical records\nmanagement transactions online. Agency records management staff use ERA to draft new\nrecords retention schedules for records in any format, officially submit those schedules for\napproval by NARA, request the transfer of permanent records in any format to NARA for\naccessioning or pre-accessioning, and submit electronic records for storage in ERA. NARA built\nERA to fulfill its mission in the digital age: to safeguard and preserve the records of our\ngovernment, ensure that the people can discover, use, and learn from this documentary heritage,\nand ensure continuing access to the essential documentation of the rights of American citizens\nand the actions of their government.\n\nUnder the Federal Records Act, NARA is given general oversight responsibilities for records\nmanagement as well as general responsibilities for archiving. This includes the preservation of\npermanent records documenting the activities of the government. 
NARA oversees agency\nmanagement of temporary and permanent records used in everyday operations and ultimately\ntakes control of permanent agency records judged to be of historic value. The law requires each\nFederal agency to make and preserve records that (1) document the organization, functions,\npolicies, decisions, procedures, and essential transactions of the agency and (2) provide the\ninformation necessary to protect the legal and financial rights of the government and of persons\ndirectly affected by the agency\xe2\x80\x99s activities. Effective management of these records is critical for\nensuring that sufficient documentation is created; that agencies can efficiently locate and retrieve\nrecords needed in the daily performance of their missions; and that records of historical\nsignificance are identified, preserved, and made available to the public. Without effective\nrecords management, the records needed to document citizens\xe2\x80\x99 rights, actions for which federal\nofficials are responsible, and the historical experience of the nation will be at risk of loss,\ndeterioration, or destruction.\n\nIn August 2004, NARA awarded two firm-fixed-price contracts, totaling approximately $20 million,\nto the Harris Corporation and to the Lockheed Martin Corporation (Lockheed) for the ERA system\ndesign phase. On September 30, 2005, NARA officials awarded a cost-plus-award-fee contract to\nLockheed to develop ERA in increments, the first of which was scheduled to be completed in\nSeptember 2007. In announcing the contract award, the former Archivist of the United States\nemphasized the importance of this mission-critical system, stating \xe2\x80\x9cthe need for ERA is urgent,\nsince there is an unprecedented number of electronic records now being created by the\nGovernment\xe2\x80\x99s departments and agencies. 
This simply must happen\xe2\x80\xa6ERA\xe2\x80\x99s failure is not an\noption.\xe2\x80\x9d\n\nNARA officials issued a Cure Notice to Lockheed in July of 2007. In response, Lockheed admitted\nthat mistakes were made in managing the requirements baseline and the design of the system.\nSpecifically, the requirements baseline was not managed, and as requirements were decomposed and\nclarified, the baseline was not updated. The contractor also admitted that the mid-level system design\nwas not fully fleshed out and integration issues were tied to that problem.\n\nAs development continued into 2010, the ERA system became the subject of Office of Management\nand Budget TechStat 4 Reviews. NARA took actions to address TechStat concerns, including\naccelerating ERA\xe2\x80\x99s development process for completion by the end of FY 2011. In June 2011,\nNARA\xe2\x80\x99s newly appointed Chief Information Officer cited the TechStat Accountability Sessions as\nbeing instrumental in helping NARA assess and plan a successful path forward for ERA.\n\nThe ERA System is NARA\xe2\x80\x99s primary strategy for addressing the challenge of storing,\npreserving, and providing public access to electronic records. The total cost to develop the\nsystem was over $390 million 5. The estimated annual cost to operate and maintain the ERA\nsystem is approximately $30 million 6.\n\nOne of NARA\'s primary challenges with ERA was to preserve different types of records along\nwith the processes and documentation required for each type. Therefore, ERA was designed\nusing separate subsystems, or instances, for each category of records. 
The initial three instances\nare the Federal Records Instance (Base ERA), deployed June 2008; the Executive Office of the\nPresident Instance (EOP), deployed December 2008; and the Congressional Records Instance\n(CRI), deployed December 2009. Two additional instances, the Census Data Storage Instance\n(Census) and the Classified Records Instance (Classified), were developed in FY 2011. Our review\nfocused on Base ERA, which is used to ingest and store non-classified electronic records from\nFederal agencies.\n\nERA as a whole represents a major system acquisition at NARA both in terms of mission\ncriticality and financial resources. Further, it is the largest information technology project ever\nundertaken by NARA. The system development phase ended September 30, 2011, and ERA is\ncurrently in an Operations and Maintenance Phase. NARA informed Federal agencies that as of\nOctober 1, 2012, NARA will use ERA for scheduling records and transferring permanent\nrecords.\n\nERA is a \xe2\x80\x9csystem of systems,\xe2\x80\x9d with multiple components performing different archival functions\nand managing records governed by different legal frameworks. 
The actual architecture is more\ncomplicated, but Diagram 1 shows the four essential functions that are intended to be performed\nby ERA.\n\n4\n  TechStat Accountability Session (TechStat) is a face-to-face, evidence-based accountability review of an IT\ninvestment; it enables the Federal Government to intervene to turn around, halt or terminate IT Projects that are\nfailing or are not producing results for the American people.\n5\n  The total cost to develop the system included the Online Public Access resource.\n6\n  The estimated costs include the Operations and Maintenance contract, hardware/software licenses, technology\nrefresh, and corrective and adaptive maintenance activities.\n\nDiagram 1\n\nAgencies use the Submission function to deliver records and metadata into ERA. Electronic\nrecords are preserved and reviewed using ERA\xe2\x80\x99s Repository function. OIG issued Audit Report\n13-03, \xe2\x80\x9cAudit of the Electronic Records Archives System\'s Ability to Preserve Records\xe2\x80\x9d, which\naddressed the status and limitations of the preservation component of ERA\xe2\x80\x99s Repository\nfunction. Our current review of ERA focuses on the ingest component of Base ERA\xe2\x80\x99s\nSubmission function. Ingest encompasses the process of bringing electronic records into the\nERA System, including physical transfer of electronic records into ERA. The remaining\nfunctions were not reviewed.\n\n\n
Objectives, Scope, Methodology\n\nThe overall objective of this audit was to evaluate and report upon the capability of NARA\xe2\x80\x99s\nBase ERA System to ingest electronic records presently and in the near future. Specifically, we\nassessed the Base ERA system\xe2\x80\x99s current capability of ingesting electronic records and evaluated\nfuture plans for increased functionality.\n\nIn order to accomplish our objectives we performed the following:\n\n   - interviewed NARA staff, NARA contractors, and staff from various Federal agencies\n     who have used Base ERA;\n\n   - sampled Federal agencies to determine whether they use the Base ERA System to ingest\n     electronic records;\n\n   - requested and reviewed documents and reports compiled by NARA staff; and\n\n   - reviewed applicable laws and regulations.\n\n\nOur audit work was performed at Archives II in College Park, Maryland. The audit took place\nbetween June 2012 and June 2013. We conducted this audit in accordance with generally\naccepted government auditing standards. Those standards require that we plan and perform the\naudit to obtain sufficient, appropriate evidence to provide a reasonable basis for our findings and\nconclusions based on our audit objectives. We believe that the evidence obtained provides a\nreasonable basis for our findings and conclusions based on our audit objectives.\n\n\n
Methodology to determine the amount of records in Base ERA.\nIn order to identify the amount of records in Base ERA we reviewed transfer requests (TRs) 7.\nUsing Base ERA reports produced by NARA we created Chart 1 to illustrate the number of TRs\nin Base ERA.\n\nChart 1\n\nChart 1 also identifies TRs with electronic records as well as Non-Legacy TRs with ingested\nelectronic records in Base ERA. Non-Legacy TRs differ from Legacy TRs in that Legacy TRs\nare associated with electronic records that were migrated into Base ERA using a NARA legacy\nsystem, the Archival Preservation System (APS). Ingest of these Legacy records into Base ERA\nwas not initiated by any Federal agency; rather NARA migrated these records into Base ERA\nusing APS. The remaining Non-Legacy records represent electronic records where ingest into\nBase ERA was initiated by a Federal agency.\n\nThere are two ways to ingest electronic records into Base ERA: Direct Ingest and Proxy Ingest.\nDirect Ingest occurs when Federal agencies transmit electronic records into Base ERA using an\nelectronic method such as HTTPS or FTP 8. 
By contrast, Proxy Ingest occurs when NARA\nofficials act as proxy for a transferring agency, thereby interacting with ERA as a "Proxy"\nagency, by actively entering new transfer data into Base ERA. For example, an agency may ship\nits electronic records to NARA on external media, such as CDs or hard drives, and have NARA\ningest the electronic records on behalf of that agency.\n\n7\n A TR is the overall unit of work for data submitted by Federal agencies for ingest into Base ERA. A TR can be\nassociated with either paper records or electronic records.\n8\n Hypertext Transfer Protocol Secure (HTTPS) is a communications protocol for secure communication over\na computer network. File Transfer Protocol (FTP) is a standard network protocol used to transfer files from one\nhost to another host over the Internet.\n\nWe identified a total of 15,074 TRs in Base ERA as of March 20, 2013. We also identified\n2,235 TRs with electronic records residing in Base ERA. Next, we filtered the data to exclude\nLegacy records that were migrated into Base ERA using APS. This resulted in 666 TRs. We\nthen filtered these 666 TRs by agency to identify TRs from agencies that directly ingested the\nelectronic records into Base ERA (104 TRs) versus TRs from agencies that relied on NARA to\ningest the electronic records on their behalf (562 TRs).\n\nTherefore, as reflected in Chart 1, based on the data contained within NARA\xe2\x80\x99s Base ERA\nreports, 15% of the TRs in Base ERA contain electronic records. In addition, 4% of all TRs in\nBase ERA represent Non-Legacy electronic records that have been ingested into Base ERA.\n\n\nAudit Results\n\n1. Lack of data ingested into Base ERA.\nBase ERA was deployed over five years ago. NARA Bulletin 2012-03 informed Federal\nagencies that as of October 1, 2012, NARA will use ERA for scheduling records and transferring\npermanent records. However, as of March 28, 2013, only 5.2 TB of electronic records resided in\nBase ERA. As a result, it appears NARA is not receiving a significant portion of the electronic\nrecords that contribute to the history of the United States which should be preserved and, if\napplicable, made available to the public. 
According to a NARA official, the lack of data in Base\nERA can be attributed to an infrequent records management workload and/or poor records\nmanagement practices by agencies.\n\nChart 2\n\nNARA\xe2\x80\x99s Weekly Operations Scorecard (Scorecard) tracks and reports the volume of electronic\nrecords residing in ERA. Chart 2 is derived from the March 28, 2013 Scorecard and shows the\nvolume of electronic records residing in three instances of ERA. By totaling the volume of\nelectronic records in these three instances, OIG identified 103.5 TB of electronic records in\nERA. As discussed earlier, this audit report focuses on Base ERA, which houses 5.2 TB of\nelectronic records.\n\nIn order to analyze the 5.2 TB of electronic records residing in Base ERA, as seen in Chart 2,\nOIG relied on NARA\xe2\x80\x99s Working Object Repository (WOR) and Managed Object Repository\n(MOR) reports as of April 4, 2013 9. These reports show the total volume of electronic records in\nBase ERA. By combining information from NARA\xe2\x80\x99s ConsolidatedTRwithContainerExcel\nReport with the WOR and MOR reports, we were able to determine whether the electronic\nrecords residing in Base ERA represented Legacy records or Non-Legacy records.\n\nLegacy records in Base ERA originally resided on tape. Ingest of these Legacy records into\nBase ERA was not initiated by any Federal agency; rather NARA migrated these records into\nBase ERA using NARA\xe2\x80\x99s Legacy system, APS. The remaining Non-Legacy records represent\nelectronic records where ingest into Base ERA was initiated by a Federal agency. 
Chart 3 shows\na total of 2.08 TB of Legacy electronic records migrated into Base ERA versus 3.2 TB of\nNon-Legacy electronic records ingested into Base ERA.\n\nChart 3\n\nThus, of the 103.5 TB universe of electronic records stored in ERA as of March 28, 2013 from\nthree of ERA\xe2\x80\x99s instances, only 3.2 TB, or 3%, of these records represent Non-Legacy electronic\nrecords that were ingested into Base ERA. We contacted NARA officials to determine the\nvolume (i.e., in TBs) of non-classified electronic records in the federal government that NARA\nwas aware of. However, these officials stated NARA does not collect this data and therefore the\nvolume is unknown. To provide some perspective from the EOP Instance, the Bush\nAdministration transferred over 79 TB of data to NARA, which was about 35 times the amount\nof electronic records transferred from the Clinton Administration. This data growth is supported\nby survey results 10 showing data volume is growing at a rate of 30% per year in environments\nsuch as Federal agencies with 50 TB or more of data.\n\n9\n The Working Object Repository (WOR) is a temporary database used by Base ERA to store data during the initial\nphase of ingest processing. At the conclusion of ingest processing, this data is moved to a final and permanent\ndatabase called the Managed Object Repository (MOR).\n\nIn addition to the lack of data ingested into Base ERA, our review also found a high percentage\nof agencies have not performed any work in Base ERA. NARA\'s lead ERA user liaison contact\nprovided information showing that as of May 1, 2013, 266 agencies had received ERA training. 
This\ninformation also identified how many agencies have or have not performed work in Base ERA.\nWe used this data to create Chart 4. In addition, we used NARA\xe2\x80\x99s WOR and MOR Reports to\nidentify agencies that have ingested electronic records via Direct Ingest or Proxy Ingest into Base\nERA.\n\nChart 4\n\nOf the 266 agencies that have received ERA training, 52% have never performed work in Base\nERA. Another 17% of the 266 agencies have used Base ERA only to create a records schedule and/or\nTR. We identified 84 agencies with electronic records ingested into Base ERA. However, of\nthese 84 agencies, 82 used Proxy Ingest, whereas only four agencies performed Direct Ingest\n(two agencies performed both methods of ingest) 11. Thus, only 84 (31%) of the 266 agencies\nthat have received ERA training have used Base ERA to ingest electronic records.\n\n10\n  Data Growth and Virtualization Mandate New Approach to Federal Storage Management, April 12, 2011.\n11\n  OIG identified 84 agencies with electronic records ingested into Base ERA. Of these 84 agencies, 82 used Proxy\nIngest, whereas four agencies performed Direct Ingest. Two agencies performed both methods of ingest and we are\nincluding these agencies in Chart 4 only once within Direct Ingest.\n\nWe asked a NARA official why 52% of the 266 Federal agencies that have received ERA\ntraining have not performed work in Base ERA. This official said the agencies that have not\ndone any work are mostly small agencies and commissions. 
Further, such agencies usually do\nnot frequently schedule records or transfer permanent records, and NARA\xe2\x80\x99s interactions with\nsuch agencies may be once every few years or longer. However, this official stated there are\nmany agencies that should have better records management programs and should be working\nmore frequently with NARA to increase usage of Base ERA. Thus, according to this NARA\nofficial, the lack of work in Base ERA can be attributed to an infrequent records management\nworkload and/or poor records management practices.\n\nWe contacted NARA officials to determine the volume (i.e., in TBs) of non-classified electronic\nrecords in the federal government that NARA was aware of. However, these officials stated\nNARA does not collect this data and therefore the volume is unknown. Because we relied on\nNARA\xe2\x80\x99s WOR and MOR Reports, which measure ingest activity in Base ERA by volume (i.e.,\nTB), it is difficult to determine if the amount of data ingested into the system is significant or not\nwithout knowing the volume of the universe of federal electronic records. NARA should\ninvestigate why more records have not been ingested into Base ERA and work with Federal\nagencies in order to improve their records management workload and records management\npractices.\n\nA previous audit, \xe2\x80\x9cNARA\xe2\x80\x99s Oversight of Electronic Records Management in the Federal\nGovernment\xe2\x80\x9d (OIG Audit Report No. 10-04, dated April 2, 2010) found NARA cannot\nreasonably ensure permanent electronic records are being adequately identified, maintained, and\ntransferred to NARA in accordance with Federal regulations. This report further stated that in\norder for NARA to ensure records of permanent value are transferred, NARA needs to take a\nmore active approach to reasonably ensuring the universe of electronic records, especially\npermanent electronic records, are identified and accounted for. 
A more assertive approach to\nidentifying and reasonably establishing the universe of electronic records will assist NARA in its\neffort to identify permanently valuable electronic records, wherever they exist, capture them, and\nmake them available to the public. We plan on conducting a follow-up review of this audit\nduring the next audit cycle to determine if a universe of federal electronic records has been\nidentified.\n\nRecommendation\n\nWe recommend NARA\xe2\x80\x99s Chief Operating Officer:\n\n   1. Assess Federal agency usage of Base ERA and implement a process to improve the\n      records management workload and records management practices that exist between\n      NARA and Federal agencies to ensure electronic records are being properly transferred\n      into Base ERA.\n\n\nManagement Response\n\nManagement concurred with this recommendation.\n\n\n2. Federal users are not directly ingesting electronic records into Base ERA.\nOur review showed Direct Ingest is not being utilized extensively by Federal users of Base ERA.\nWe identified 84 agencies with electronic records ingested into Base ERA. However, of these 84\nagencies, 82 used Proxy Ingest, whereas only four agencies had performed Direct Ingest\n(two agencies performed both methods of ingest). The reasons Federal agencies stated for not\nperforming Direct Ingest included: not being ready for Direct Ingest, comfort using Proxy Ingest,\nfollowing the guidance of NARA, having no applicable data to ingest, having records with\nsecurity issues, and experiencing issues with Base ERA. As a result, only 3.2 TB of Non-Legacy\nelectronic records have been ingested into Base ERA. 
In addition, NARA resources are being\nused to perform the ingest functions for other agencies.\n\nAs discussed previously, there are two ways to ingest electronic records into Base ERA: Direct\nIngest and Proxy Ingest. In order to determine which method of ingest agencies were using to\ntransfer electronic records into Base ERA we reviewed and analyzed ERA ingest reports and\ninterviewed NARA officials. We also contacted 36 individuals at 35 agencies whom we identified\nas potential Base ERA users.\n\nOur discussions and analysis identified four agencies that have used Direct Ingest to transfer\nrecords into Base ERA. The remaining Non-Legacy electronic records in Base ERA were\ningested via Proxy Ingest. By filtering the data in NARA\xe2\x80\x99s WOR and MOR Reports by agency,\nwe identified 1.0 TB of electronic records ingested into Base ERA using Direct Ingest, and 2.2\nTB of electronic records ingested into Base ERA using Proxy Ingest as shown in Chart 5.\n\nChart 5\n\nAdditionally, we found that NARA Processing Archivists are directing agencies not to perform\nDirect Ingest. NARA staff stated this was because agencies typically do not create\nwell-structured, well-understood, \xe2\x80\x9cclean\xe2\x80\x9d records. In addition, NARA staff explained how it takes\nmanual intervention of an archivist to determine whether records are correct in terms of content\nand format so they can be properly processed and preserved.\n\nNARA officials stated that due to the complex nature of many electronic records transfers,\nwhat NARA receives is in need of significant examination and verification to ensure that it is\nwhat should be preserved. 
Direct Ingest into ERA makes this process difficult because the system was
designed under the assumption that what agencies send would in fact be correct as received.
According to NARA officials, when agencies perform Direct Ingest it is very difficult to "back
the transfer out" and do the necessary verification.

Some NARA officials believed NARA should perform all of the ingest activities. One official
stated it is easier and cleaner for NARA to perform ingest because processing archivists can view
and organize data prior to ingest into ERA. Thus, they can confirm data received from an agency
is what was expected, and is readable.

Although NARA staff have reasons for directing agencies not to use Direct Ingest, the intent of
Base ERA was for agencies to perform the ingest function. Therefore, NARA needs to
determine the most efficient and effective way (i.e., Direct Ingest or Proxy Ingest) to ingest
electronic records and convey it to the users.

We also contacted Federal agencies in order to ascertain how they are using Base ERA. Our
sample of Federal agencies contacted was created using various sources. We contacted NARA
officials and asked for examples of non-NARA Base ERA users who have experience and are
familiar with the ingest function. In addition, we contacted agencies with a high number of
TRs, as well as agencies that completed ERA system user surveys. This resulted in a list of 35
Federal agencies comprising 58% of the TRs in Base ERA and 73% of the Non-Legacy volume
of data in Base ERA.

We contacted 36 individuals at these 35 Federal agencies and asked them if they used Base ERA
to ingest records, and if so, their method of ingest. Of the 35 agencies, 29 responded to our
inquiry. We tailored our sample of agencies towards those that accounted for over half of the
TRs in Base ERA and close to three quarters of the Non-Legacy volume of data in Base ERA in
order to identify agencies most familiar with ERA.
However, based on our analysis of the 29
agencies’ responses, we found that only four agencies used Direct Ingest to transfer electronic
records into Base ERA.

The 25 agencies that did not attempt Direct Ingest provided several reasons. These
included: not being ready for Direct Ingest, being comfortable using Proxy Ingest,
following the guidance of NARA, having no applicable data to ingest, having records with
security issues, and experiencing issues with Base ERA.

NARA’s Agency ERA Adoption Report states that, according to NARA’s Strategic Goal 3, NARA
will address the challenges of electronic records in Government to ensure success in fulfilling
NARA’s mission in the digital era. Central to achieving this goal is the acceptance and use of
ERA by Federal agencies. The increased use of ERA to schedule, ingest, process, and store
electronic records from Federal agencies, Congress, and the Executive Office of the President
will result in better management of Federal records, in particular the preservation of permanent
electronic records.

NARA should investigate this issue in order to increase the usage of Base ERA by Federal
agencies. In addition, NARA needs to determine the most efficient and effective way to ingest
electronic records into Base ERA (i.e., Direct Ingest or Proxy Ingest) and convey this information
to the Federal agencies who use the system.

Recommendation

We recommend NARA’s Chief Operating Officer:

   2. Identify the most efficient and effective method of ingest (i.e., Direct Ingest or Proxy Ingest)
      and require Federal agencies to follow this method when transferring electronic records
      into Base ERA.
      In addition, this information should be properly disseminated to Federal
      agencies.

Management Response

Management concurred with this recommendation.


3. ERA System experiences performance issues.
Base ERA experiences problems when ingesting large amounts of data. First, packages or
shipments of files with a size of 1 GB (and sometimes less) fail to transfer from agency sites to
the Base ERA ingest staging area using the web version of Base ERA. In addition, the system
fails when a user attempts to ship a package containing 10,000 or more files. Lastly, TRs fail if
the number of files/folders associated with the TR approaches or exceeds 100,000 files. NARA
believes that system design limitations may be the cause of some of these weaknesses, but the
actual cause for all of them is not known. As a result, the system’s usefulness to NARA and
other Federal agencies is limited.

A TR is the overall unit of work for data submitted by agencies for ingest into Base ERA. A TR
can consist of one or more shipments, which are collections of data files packaged together for
ease of submission to Base ERA. The single file that results from the collection of files into a
shipment is called a package. A package is essentially a Zip file containing the individual data
files and a manifest describing the included files.

Agencies create packages for submission to ERA that are 650 MB, 1 GB, or 4 GB in size. These
sizes allow agencies, respectively, to write the package to a CD, transmit the file over the
network, or write the file to a DVD. The number of data files placed into any one package
depends on the sizes of the individual data files. If the files are small enough, and the agency
chooses a large enough package size, it is possible to create packages containing tens, or even
hundreds of thousands of files.

Agencies rely on one of three methods to supply data to Base ERA for ingestion. Agencies can:

     • Ship the data to NARA on media (e.g., CD, DVD, disc, or thumb drive),

     • Use SFTP to transfer the data to an FTP site provided to the agency by NARA, or

     • Use HTTPS from a web browser to transfer data to a web server location provided to the
       agency by NARA.

When using HTTPS, packages/shipments greater than 1 GB in size fail to transfer from agency
sites to the Base ERA ingest staging area. Because of the problems using HTTPS from a web
browser to transfer data, and the overhead involved in shipping data to NARA on media, SFTP
has become the current method of choice for transferring data. However, the FTP client
preferred by NARA appears to have issues when large files are transferred. The problem
manifests as corrupted files after the file transfer has completed. A secure FTP client needs to be
identified that can handle large file transfers and allow the client to restart transfers that end
prematurely because of network problems.

When a package approaches or exceeds 10,000 files, ingest of the package typically fails. When
there is a failure, no indication of an error is sent to the NARA archivist who initiated ingest, nor
is any error indicated to an administrator. Typically, the responsible archivist will eventually
notice the TR they submitted for ingest has not completed, usually days after the submission.
The archivist will then notify the ERA Help Desk to investigate the problem. In many of these
cases, manual intervention by Help Desk staff is required to complete the ingest process.

Another issue is that the ingest process fails if the number of data files associated with the TR
approaches or exceeds 100,000.
NARA has stated this issue may be related to the 10,000-file
problem with individual packages, and recommended that the analysis of both issues
consider this possibility.

NARA believes that system design limitations may be the cause of some of these weaknesses,
but the actual cause for all of them is not known. This has resulted in NARA officials drafting a
Technical Direction Letter (TDL) titled “ERA Base Small Fixes (Ingest Robustness)” that, when
issued, would have the ERA operations and maintenance contractor research the cause of these
weaknesses and correct them. However, work related to the draft TDL has been suspended until
NARA completes a detailed analysis of race conditions [12] related to Base ERA.

[12] Race conditions are defined as a flaw in a software system where the output is not
deterministic but depends on the sequence or timing of other uncontrollable events.

As a result of these weaknesses, the system’s usefulness to NARA and other Federal agencies is
limited. For example, there are over 30 TB of data in the ingest staging area which, due to the
size of these files, are unable to be processed through Base ERA. One of these datasets contains
over 56,000,000 files. Because manual workarounds are needed when a TR approaches
approximately 100,000 files, about 560 manual workarounds would be needed to ingest this data.
NARA officials stated that given what they know about how the system reacts to ingest, more
realistically they would need to create between 3,000 and 10,000 TRs to ingest this data.
Since
this data has not gone through the ERA System, it is not being preserved, and is not searchable
within ERA.

Further, the volume of data is expected to increase significantly in future years. Recent estimates
from an IT consulting firm put the current volume of data stored at Federal agencies at 1.6
petabytes. This volume is projected to increase to 2.6 petabytes within the next two years.
Because the Base ERA System already has problems handling current file sizes, this weakness
will continue to worsen if it is not addressed. NARA officials need to begin planning for an
increase in the size of files as well as the volume of data.

In order to create a more useful Base ERA, NARA should continue the detailed analysis of race
conditions related to Base ERA. After the conclusion of this analysis, NARA should use the
information learned to create a plan to analyze and correct the issues identified in the draft TDL
discussed above.

Recommendation

We recommend NARA’s Chief Operating Officer:

   3. Work with NARA’s Chief Information Officer to continue the detailed analysis of race
      conditions related to Base ERA. After the conclusion of this analysis, NARA should use
      the information learned to create a plan to either correct ingest issues affecting the Base
      ERA System or provide alternate or improved ingest processes.

Management Response

Management concurred with this recommendation.



4. Other Matters.
ERA Reporting Deficiencies

OIG relied on various reports produced by NARA to gain an understanding of how Federal
agencies are using Base ERA. However, while reviewing these reports, we found inaccurate
data. In addition, NARA was unable to produce reports showing important information needed
to understand Base ERA usage.
Because of these reporting deficiencies, our efforts to
understand Base ERA usage were hindered.

For example, a NARA official provided us with a link to ERA-related reports that are updated
weekly. After analyzing one of these reports, Report5-6, we found data discrepancies; for
example, the total volume of electronic records for one agency was approximately 2,500 MB
lower than that agency’s Non-Legacy volume of electronic records. When questioned about this,
NARA responded that they had recently found duplicate data in the system. NARA fixed the
problem and the following week’s version of Report5-6 was properly corrected.

We also informed NARA that this same issue affected another report, the TPR-LTI Report.
In order to fix this report, NARA needed to correct the logic of the report so that it did not
double count data. Again, by the following week NARA corrected the TPR-LTI Report.

We also requested reports identifying who was performing the ingest function. However, the
reports provided by NARA only covered a period of approximately one month. NARA
determined that the level of detail found in the requested report is only logged when the ERA
system is set to DEBUG mode, which is usually only turned on when staff is troubleshooting a
problem. Therefore, the data found in the report was only captured for small periods of time.

NARA was able to provide a replacement report showing a list of shipments NARA believed
were ingested by agencies. However, within the report NARA could not tell whether an ERA
user was initiating processing or just clicking a button to show files more than once.
Thus, the
report could not accurately identify who initiated ingest, and NARA was unable to produce a
report accurately identifying who was performing the ingest function.

Finally, we asked NARA staff how many agencies use Proxy Ingest. In response, NARA staff
stated that information is not routinely captured in a report. Therefore, NARA staff manually
assembled the information and determined that during FY 2012, 33 different agencies sent files
to NARA via Proxy Ingest. However, our independent review identified 50 agencies using
Proxy Ingest during this same time period. Therefore, our ability to rely on assertions
made by NARA was diminished.

The issues discussed above involving reporting on Base ERA hindered our efforts to understand
Base ERA.


Appendix A – Acronyms and Abbreviations

APS     Archival Preservation System
CD      Compact Disc
CRI     Congressional Records Instance
DVD     Digital Video Disc
EOP     Executive Office of the President
ERA     Electronic Records Archives
FTP     File Transfer Protocol
GB      Gigabyte
HTTPS   Hypertext Transfer Protocol Secure
IT      Information Technology
MB      Megabyte
MOR     Managed Object Repository
NARA    National Archives and Records Administration
OIG     Office of the Inspector General
SFTP    Secure File Transfer Protocol
TB      Terabyte
TDL     Technical Direction Letter
TR      Transfer Request
WOR     Working Object Repository


Appendix B – Management’s Response to the Report

NATIONAL ARCHIVES

Date:     SEP 13 2013
To:       James Springs, Acting Inspector General
From:     David S. Ferriero, Archivist of the United States
Subject:  DRAFT OIG Report 13-11, Audit of the Base ERA System's Ability to Ingest Records

Thank you for the opportunity to review the subject draft report. We appreciate your time in
reviewing our informal comments and making some clarifying adjustments.

We concur with the three recommendations and we will address them further in our action
plan. If you have any questions about this response, please contact Mary Drak at 301-837-1668
or at mary.drak@nara.gov.

DAVID S. FERRIERO
Archivist of the United States

NATIONAL ARCHIVES and
RECORDS ADMINISTRATION
8601 ADELPHI ROAD
COLLEGE PARK, MD 20740-6001
www.archives.gov


Appendix C – Report Distribution List

Archivist of the United States (N)

Deputy Archivist

Chief Information Officer

Chief Operating Officer